The file titled US Electricity.csv includes a time series index compiled by the US Federal Reserve representing total fossil-fuel US electricity generation by all utilities from January 1939 through October 2021.
In the following code box we read the CSV file and set up the data as a tsibble and then we plot it and subset it to examine it.
#clear environment
rm(list = ls())
library(fpp3)
D <- read.csv("US Electricity.csv") %>%
mutate(DATE = yearmonth(DATE)) %>%
as_tsibble(index = DATE)
D %>% autoplot(ELEC)
DR <- D %>% filter(DATE >= yearmonth("2010 Jan"))
DR %>% autoplot(ELEC)
We are interested in developing a two-year long monthly forecast (24 months) for the national electricity production requirements.
#stationarity of DR - not stationary, need to add seasonality
#ACF and PACF
DR %>% gg_tsdisplay(ELEC, plot_type = "partial") +
labs(title="DR data", y="")
#add seasonality
DR.12 <- DR %>% gg_tsdisplay(difference(ELEC, 12),
plot_type='partial', lag=60) +
labs(title="Seasonally differenced - 12", y="")
DR.12
Warning: Removed 12 row(s) containing missing values (geom_path).
Warning: Removed 12 rows containing missing values (geom_point).
DR.4 <- DR %>% gg_tsdisplay(difference(ELEC, 4),
plot_type='partial', lag=60) +
labs(title="Seasonally differenced - 4", y="")
DR.4
Warning: Removed 4 row(s) containing missing values (geom_path).
Warning: Removed 4 rows containing missing values (geom_point).
NA
NA
NA
DR is not stationary, all of the lags are significant lags in the ACF plot. We need to figure out how many differences to take until it is stationary. PACF shows only 2 very large significant lags at the beginning. Because the ACF is complex, and the PACF dies down slowly, I believe this a Moving Average forecasting model would best suite this data. It also looks like the ACF is showing some seasonality in the lags, as there is a pattern to the spikes. This makes sense as your electricity usage changes as the weather changes.
The PACF dies down so we will identify this model as a MA(1) or MA(2) seeing if seasonal MA or regular MA results in a better arima model. I think it could possibly be seasonality 4, rather than seasonality 12 as well. 1) ARIMA(1,1,1)(1,1,1) [12] 2) ARIMA(1,1,2)(1,1,1) [12] 3) ARIMA(1,1,1)(1,1,2) [12]
#how many differences it needs
DR %>% features(ELEC, unitroot_nsdiffs) #seasonal differences needed = 1
DR %>% features(ELEC,unitroot_ndiffs) #differences needed = 0
fitDR <- DR %>%
model( #none of the models worked with d = 0 - HELP
arima1 = ARIMA(ELEC ~ pdq(1,1,1) + PDQ(1,1,1)),
arima2 = ARIMA(ELEC ~ pdq(1,1,2) + PDQ(1,1,1)),
arima3 = ARIMA(ELEC ~ pdq(1,1,1) + PDQ(1,1,2)),
auto.arima = ARIMA(ELEC), #100210
auto.ETS = ETS(ELEC)) #error("M") + trend("N") + season("A")
fitDR %>% pivot_longer(everything(), names_to = "Model name",
values_to = "Orders")
glance(fitDR) %>% arrange(AICc) %>% select(.model:BIC)
fitDR %>% augment() %>%
features(.resid, ljung_box, lag = 60)
fitDR %>% select(arima1) %>% gg_tsresiduals()
fitDR %>% select(arima2) %>% gg_tsresiduals()
fitDR %>% select(arima3) %>% gg_tsresiduals()
fitDR %>% select(auto.arima) %>% gg_tsresiduals()
fitDR %>% select(auto.ETS) %>% gg_tsresiduals()
There is not a validity issue with any of the models, possibly the ETS model could be invalid because it has 2 significant ACF lags while the other models do not. They all have normally distributed residuals centered around 0, and ACFs with mostly nonsignificant lags.
For model cross-validation purposes stretch the DR data as follows:
fitDR %>% accuracy() %>% select(.model, MAPE, RMSE, MAE)
fitDR %>% glance() %>%
select(.model, AIC, AICc, BIC)
Best 2 arima Models = arima3 ARIMA(1,1,1)(1,1,2) [12] and arima1 ARIMA(1,1,1)(1,1,1) [12] ETS model = Elec ~ error(“M”) + trend(“N”) + season(“A”)
#3 or 4 models??
#kinda takes awhile
#given code
DR.CV <- DR %>%
filter(DATE >= yearmonth("2010 Jan")) %>%
stretch_tsibble(.init = 36, .step = 1)
mC <- DR.CV %>%
model(
arima1 = ARIMA(ELEC ~ pdq(1,1,1) + PDQ(1,1,1)),
arima2 = ARIMA(ELEC ~ pdq(1,1,2) + PDQ(1,1,1)),
arima3 = ARIMA(ELEC ~ pdq(1,1,1) + PDQ(1,1,2)),
ETS(ELEC ~ error("M") + trend("N") + season("A")))
Warning in sqrt(diag(best$var.coef)) : NaNs produced
Warning: It looks like you're trying to fully specify your ARIMA model but have not said if a constant should be included.
You can include a constant using `ARIMA(y~1)` to the formula or exclude it by adding `ARIMA(y~0)`.
Warning: It looks like you're trying to fully specify your ARIMA model but have not said if a constant should be included.
You can include a constant using `ARIMA(y~1)` to the formula or exclude it by adding `ARIMA(y~0)`.
Warning: It looks like you're trying to fully specify your ARIMA model but have not said if a constant should be included.
You can include a constant using `ARIMA(y~1)` to the formula or exclude it by adding `ARIMA(y~0)`.
Warning in sqrt(diag(best$var.coef)) : NaNs produced
Warning in sqrt(diag(best$var.coef)) : NaNs produced
Warning in sqrt(diag(best$var.coef)) : NaNs produced
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning: It looks like you're trying to fully specify your ARIMA model but have not said if a constant should be included.
You can include a constant using `ARIMA(y~1)` to the formula or exclude it by adding `ARIMA(y~0)`.
Warning: 3 errors (1 unique) encountered for arima2
[3] Could not find an appropriate ARIMA model.
This is likely because automatic selection does not select models with characteristic roots that may be numerically unstable.
For more details, refer to https://otexts.com/fpp3/arima-r.html#plotting-the-characteristic-roots
Warning: 37 errors (1 unique) encountered for arima3
[37] Could not find an appropriate ARIMA model.
This is likely because automatic selection does not select models with characteristic roots that may be numerically unstable.
For more details, refer to https://otexts.com/fpp3/arima-r.html#plotting-the-characteristic-roots
fCV <- mC %>%
forecast(h = 24) %>% #for each of the 4 model IDs you have 24 forecasts
group_by(.id, .model) %>% #ID of one corresponds to 3 models
mutate(h = row_number()) %>% #create variable h for the .id #
ungroup() #forecast cross validation
fCV %>%
accuracy(D, by = c("h", ".model")) %>% #shows you all the CV metrics
ggplot(aes(x = h, y = MAPE, color = .model)) +
geom_line()
Warning: The future dataset is incomplete, incomplete out-of-sample data will be treated as missing.
24 observations are missing between 2021 Oct and 2023 Sep
fCV %>%
accuracy(D, by = c("h", ".model")) %>%
ggplot(aes(x = h, y = RMSE, color = .model)) +
geom_line()
Warning: The future dataset is incomplete, incomplete out-of-sample data will be treated as missing.
24 observations are missing between 2021 Oct and 2023 Sep
The models with the lowest RMSE are the arima3 model ARIMA(1,1,1)(1,1,2) [12] and the auto selected ETS model. The arima3 model had the lowest AIC value, however the ETS model had the highest AIC so it is interesting to see it with such a low RMSE value for the 25 forecasted values. While the ETS model may not be the best at capturing the time series data, it is one of the best, among our options, for forecasting for future periods.
DR.CV <- DR %>%
filter(DATE >= yearmonth("2010 Jan")) %>%
stretch_tsibble(.init = 36, .step = 1)
DR.CV %>%
model(
ARIMA(ELEC ~ pdq(0,1,2) + PDQ(0,1,2)),
ETS(ELEC ~ error("M") + trend("N") + season("A"))) %>%
forecast(h = 24) %>%
accuracy(DR) %>%
select(.model, RMSE:MAPE)
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning in max(PDQ$Q) : no non-missing arguments to max; returning -Inf
Warning: 36 errors (1 unique) encountered for ARIMA(ELEC ~ pdq(0, 1, 2) + PDQ(0, 1, 2))
[36] Could not find an appropriate ARIMA model.
This is likely because automatic selection does not select models with characteristic roots that may be numerically unstable.
For more details, refer to https://otexts.com/fpp3/arima-r.html#plotting-the-characteristic-roots
Warning: The future dataset is incomplete, incomplete out-of-sample data will be treated as missing.
24 observations are missing between 2021 Oct and 2023 Sep
#how to look at correlation of the two? - gg_residuals?
fitDR %>%
select(arima3) %>%
forecast(h = 24) %>%
autoplot(DR)
fitDR %>%
select(arima3) %>%
forecast(h = 24)
m <- fitDR %>%
select(arima3) %>%
forecast(h = 24)
summary(m)
Error: `summary.yearmonth()` not implemented.
Run `rlang::last_error()` to see where the error occurred.
The 95% confidence interval for Electricity usage for the next 24 months is between 80 and 125.